Norwegian University of Science and Technology Technical Report IDI-TR-8/2010 Exploiting Time-based Synonyms in Searching Document Archives

Authors

  • Nattiya Kanhabua
  • Kjetil Nørvåg
Abstract

Recently a large number of easily accessible information resources have become available. To increase search quality, document creation time can be taken into account in order to increase precision, and query expansion of named entities can be employed in order to increase recall. A peculiarity of named entities compared to other vocabulary terms is that they are very dynamic in appearance, and synonym relationships between terms change with time. In this paper, we present an approach to extract synonyms of named entities over time from the whole history of Wikipedia. In addition, we use their temporal patterns as a feature in ranking and classifying them into two types, i.e., time-independent or time-dependent. Time-independent synonyms are invariant to time, while time-dependent synonyms are relevant to a particular time period, i.e., the synonym relation changes over time. Further, we describe how to make use of both types of synonyms in order to increase the retrieval effectiveness (precision and recall), i.e., query expansion with time-independent synonyms for an ordinary search, and query expansion with time-dependent synonyms for a search with respect to temporal criteria. Finally, through an evaluation based on TREC collections we demonstrate how the retrieval performance of queries consisting of named entities can be improved using our approach.
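
The two expansion modes described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Synonym` class, the `expand_query` function, the year-based validity intervals, and the placeholder entries in `SYNONYMS` are all assumptions made for illustration, and the Wikipedia-based extraction, ranking, and classification steps are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Synonym:
    """A synonym of a named entity (hypothetical structure).

    start_year/end_year of None mean "unbounded"; a synonym with both
    unset is treated here as time-independent.
    """
    term: str
    start_year: Optional[int] = None
    end_year: Optional[int] = None

    def is_time_independent(self) -> bool:
        return self.start_year is None and self.end_year is None

    def valid_in(self, year: int) -> bool:
        after_start = self.start_year is None or self.start_year <= year
        before_end = self.end_year is None or year <= self.end_year
        return after_start and before_end


def expand_query(entity: str,
                 synonyms: Dict[str, List[Synonym]],
                 year: Optional[int] = None) -> List[str]:
    """Return the entity followed by the synonyms used for expansion.

    Without a year (ordinary search) only time-independent synonyms are
    added; with a year (temporal criteria) only time-dependent synonyms
    valid in that year are added.
    """
    candidates = synonyms.get(entity, [])
    if year is None:
        extra = [s.term for s in candidates if s.is_time_independent()]
    else:
        extra = [s.term for s in candidates
                 if not s.is_time_independent() and s.valid_in(year)]
    return [entity] + extra


# Placeholder entries, not data from the report.
SYNONYMS = {
    "entity_a": [
        Synonym("alias_a"),                # time-independent
        Synonym("role_x", 2001, 2005),     # time-dependent
    ],
}
```

Calling `expand_query("entity_a", SYNONYMS)` expands with `alias_a` only, while `expand_query("entity_a", SYNONYMS, year=2003)` expands with `role_x` only, which mirrors the split between ordinary and temporal search stated above.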

Similar resources

Norwegian University of Science and Technology Technical report IDI-TR-11/2002 Supporting Temporal Text-Containment Queries

In temporal document databases and temporal XML databases, temporal text-containment queries are a potential performance bottleneck. In this paper we describe how to manage documents and index structures in such databases in a way that makes temporal text-containment querying feasible. We describe and discuss different index structures that can improve such queries. Three of the alternatives have...

Norwegian University of Science and Technology Technical Report IDI-TR-09/2007 Semantic-Based Association Rule Mining of Temporal Document Collections

In many contexts today we have documents available in a number of versions. In addition to explicit knowledge that can be queried/searched in documents, these documents also contain implicit knowledge that can be found by text mining. In this paper we will study association rule mining of temporal document collections, and extend our previous work by 1) performing mining based on semantics as w...

Norwegian University of Science and Technology Technical report IDI-TR-X/2002, last revised: 2002-09-02 V2: A Database Approach to Temporal Document Management

The advent of large amounts of data on the web has closed the gap between the document storage and database communities. In this paper, this work is continued by the description of the foundations for temporal document databases. We describe the V2 temporal document database, which supports storage, retrieval, and querying of temporal documents. We describe functionality and operations/operator...

Norwegian University of Science and Technology Technical report IDI-TR-10/2002 Design, Implementation, and Performance of the V2 Temporal Document Database System

The advent of large amounts of data on the web has closed the gap between the document storage and the database communities. In this paper, this work is continued by the description of the foundations for temporal document databases. We describe functionality and operations/operators to be supported by such systems, and more specifically we describe the architecture for management of temporal d...

Norwegian University of Science and Technology Technical Report IDI-TR-1/2003 Algorithms for Granularity Reduction in Temporal Document Databases

With rapidly decreasing storage costs, temporal document databases are now a viable solution in many contexts. However, storing an ever growing database can still be too costly, and as a consequence it is desirable to be able to physically delete old versions. Traditionally, this has been performed by an operation called vacuuming, where the oldest versions are physically deleted (or migrated fro...

Publication date: 2010